Sentiment Detection with Character n-Grams
Abstract
Automatic detection of the sentiment of a given text is a difficult but highly relevant task. Application areas range from financial news, where information about sentiment can be used to predict stock movements, to social media, where user recommendations can determine the success or failure of a product. We have developed a methodology, based on character n-grams, to detect sentiments encoded in text. In this paper we present the founding idea and the algorithms, as well as a usage scenario with an evaluation. We discuss the obtained results in detail and compare them with those of other popular sentiment detection methodologies.
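For readers unfamiliar with the term, a character n-gram is a run of n consecutive characters taken from a text, including spaces. The abstract does not describe the authors' algorithm, so the following is only a minimal sketch of how such n-grams can be extracted and counted. The function name `char_ngrams` and the choice of n = 3 are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in `text`.

    Hypothetical helper for illustration; not the paper's method.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Slide a window of width n over the text, one character at a time.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Example: trigram counts for a short phrase.
profile = char_ngrams("good food", n=3)
```

Counts like these could serve as features for a classifier. How the paper turns n-grams into a sentiment decision is not stated in the abstract.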
Similar Resources
A Comparison of Approaches for Sentiment Classification on Lithuanian Internet Comments
Despite many methods that effectively solve the sentiment classification task for such widely used languages as English, there is no clear answer as to which methods are most suitable for languages that are substantially different. In this paper we attempt to solve the Internet comment sentiment classification task for Lithuanian, using two classification approaches – knowledge-based and supervised ...
On the Impact of Sentiment and Emotion Based Features in Detecting Online Sexual Predators
According to previous work on pedophile psychology and cyberpedophilia, sentiments and emotions in texts could be a good clue to detect online sexual predation. In this paper, we have suggested a list of high-level features, including sentiment and emotion based ones, for detection of online sexual predation. In particular, since pedophiles are known to be emotionally unstable, we were interest...
PAN 2017: Author Profiling - Gender and Language Variety Prediction
We present the results of gender and language variety identification performed on the tweet corpus prepared for the PAN 2017 Author profiling shared task. Our approach consists of tweet preprocessing, feature construction, feature weighting and classification model construction. We propose a Logistic regression classifier, where the main features are different types of character and word n-gram...
Enhanced Twitter Sentiment Classification Using Contextual Information
The rise in popularity and ubiquity of Twitter has made sentiment analysis of tweets an important and well-covered area of research. However, the 140 character limit imposed on tweets makes it hard to use standard linguistic methods for sentiment classification. On the other hand, what tweets lack in structure they make up with sheer volume and rich metadata. This metadata includes geolocation,...
LASSA: Emotion Detection via Information Fusion
Due to the complexity of emotions in suicide notes and the subtle nature of sentiments, this study proposes a fusion approach to tackle the challenge of sentiment classification in suicide notes: leveraging WordNet-based lexicons, manually created rules, character-based n-grams, and other linguistic features. Although our results are not satisfying, some valuable lessons are learned and promisi...